MegDet: A Large Mini-Batch Object Detector
The improvements in recent CNN-based object detection works, from R-CNN [11],
Fast/Faster R-CNN [10, 31] to recent Mask R-CNN [14] and RetinaNet [24], mainly
come from new network architectures, new frameworks, or novel loss designs. But
mini-batch size, a key factor in training, has not been well studied. In this
paper, we propose a Large Mini-Batch Object Detector (MegDet) to enable training
with a much larger mini-batch size than before (e.g., from 16 to 256), so that we
can effectively utilize multiple GPUs (up to 128 in our experiments) to
significantly shorten the training time. Technically, we suggest a learning
rate policy and Cross-GPU Batch Normalization, which together allow us to
successfully train a large mini-batch detector in much less time (e.g., from 33
hours to 4 hours), and achieve even better accuracy. MegDet is the backbone
of our submission (mmAP 52.5%) to the COCO 2017 Challenge, where we won 1st
place in the Detection task.
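The abstract names two ingredients, a learning rate policy and Cross-GPU Batch Normalization, without giving their details. The sketch below is a hedged illustration only. The learning rate function assumes a linear scaling rule with linear warmup, which is a common policy for large mini-batches, and its name and parameters (base_batch, warmup_steps) are hypothetical. The batch normalization function simulates devices as a list of NumPy arrays and pools statistics across all of them: first a global mean from per-device sums, then a global variance from per-device sums of squared deviations. In a real multi-GPU setting each sum would be an all-reduce.

```python
import numpy as np


def warmup_linear_scaled_lr(step, base_lr, base_batch, batch, warmup_steps):
    """Hypothetical LR policy: linear scaling rule plus linear warmup (an assumption)."""
    # Scale the learning rate in proportion to the mini-batch size.
    target_lr = base_lr * batch / base_batch
    if step < warmup_steps:
        # Ramp linearly from base_lr up to target_lr during warmup.
        return base_lr + (target_lr - base_lr) * step / warmup_steps
    return target_lr


def cross_gpu_batch_norm(shards, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize every device's shard with statistics pooled over all devices.

    `shards` is a list of (n_i, channels) arrays, one per simulated device.
    """
    # Pass 1: per-device sums and counts, aggregated to obtain the global mean.
    total = sum(s.sum(axis=0) for s in shards)
    count = sum(s.shape[0] for s in shards)
    mean = total / count
    # Pass 2: per-device sums of squared deviations from the global mean,
    # aggregated to obtain the global (biased) variance.
    sq_dev = sum(((s - mean) ** 2).sum(axis=0) for s in shards)
    var = sq_dev / count
    # Each device normalizes its own shard with the shared statistics.
    return [gamma * (s - mean) / np.sqrt(var + eps) + beta for s in shards]
```

Because the statistics are pooled, the concatenated output matches ordinary batch normalization over the whole large mini-batch. Per-device normalization would instead use only a few images per GPU.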
Mitigating Label Biases for In-context Learning
Various design settings for in-context learning (ICL), such as the choice and
order of the in-context examples, can bias the model's predictions. While many
studies discuss these design choices, there have been few systematic
investigations into categorizing them and mitigating their impact. In this
work, we define a typology for three types of label biases in ICL for text
classification: vanilla-label bias, context-label bias, and domain-label bias
(which we conceptualize and detect for the first time). Our analysis
demonstrates that prior label bias calibration methods fall short of addressing
all three types of biases. Specifically, domain-label bias restricts LLMs to
random-level performance on many tasks regardless of the choice of in-context
examples. To mitigate the effect of these biases, we propose a simple bias
calibration method that estimates a language model's label bias using random
in-domain words from the task corpus. After controlling for this estimated bias
when making predictions, our novel domain-context calibration significantly
improves the ICL performance of GPT-J and GPT-3 on a wide range of tasks. The
gain is substantial on tasks with large domain-label bias (up to 37% in
Macro-F1). Furthermore, our results generalize to models with different scales,
pretraining methods, and manually designed task instructions, showing the
prevalence of label biases in ICL.
Comment: Accepted to ACL 202
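The calibration idea above (estimate label bias from random in-domain words, then correct for it at prediction time) can be sketched as follows. This is a hedged sketch, not the authors' implementation. It assumes a hypothetical `score_fn(text)` that returns label probabilities from the model with the in-context examples already fixed. The text length, the number of samples, and the choice to divide each label probability by the estimated bias (in the style of contextual calibration) are also assumptions.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: maps an input text to {label: probability}.
ScoreFn = Callable[[str], Dict[str, float]]


def estimate_domain_label_prior(score_fn: ScoreFn, corpus_words: Sequence[str],
                                length: int, n_samples: int,
                                labels: List[str], seed: int = 0) -> Dict[str, float]:
    """Average label probabilities over texts made of random in-domain words."""
    rng = random.Random(seed)
    totals = {y: 0.0 for y in labels}
    for _ in range(n_samples):
        # Build a pseudo-input by sampling words from the task corpus.
        text = " ".join(rng.choice(corpus_words) for _ in range(length))
        probs = score_fn(text)
        for y in labels:
            totals[y] += probs[y]
    return {y: totals[y] / n_samples for y in labels}


def calibrated_predict(score_fn: ScoreFn, text: str,
                       prior: Dict[str, float]) -> str:
    """Pick the label whose probability is largest relative to the estimated bias."""
    probs = score_fn(text)
    return max(prior, key=lambda y: probs[y] / prior[y])


def toy_scorer(text: str) -> Dict[str, float]:
    # Toy stand-in for a biased model: it favors "positive" on in-domain text.
    if "bad" in text:
        return {"positive": 0.55, "negative": 0.45}
    return {"positive": 0.8, "negative": 0.2}


prior = estimate_domain_label_prior(toy_scorer, ["movie", "plot", "actor"],
                                    length=5, n_samples=10,
                                    labels=["positive", "negative"])
label = calibrated_predict(toy_scorer, "a bad movie", prior)
```

With the toy scorer, the raw argmax for "a bad movie" is "positive" (0.55 vs 0.45). After dividing by the estimated bias (about 0.8 and 0.2), the prediction flips to "negative". This illustrates how a domain-level skew can mask the input's actual signal.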